Discourse Segmentation in Aid of Document Summarization
نویسندگان
چکیده
This paper describes work to enhance a sentencebased summarizer with notions of salience, dynamicallyadjustable summary size, discourse segmentation, and awareness of topic shifts. Our experiments study strategies to diversify the application of a baseline summarizer, by making it aware of finer-grained ‘aboutness’, capable of discerning changes of topic, and sensitive to longer-thanusual documents. Evaluated against the corpus used in the development of the baseline summarizer, summaries derived either by means of segmentation analysis alone, or by a mix of strategies for combining salience calculation and topic shift detection, are shown to be of comparable, and under certain conditions even better, quality. We describe the summarization and segmentation procedures, outline a number of strategies for mixing the two, evaluate the overall impact of discourse segmentation, and suggest an interface design capable of using the notion of topic shifts to contextualize a summary and facilitate the mediation between it and the full document source.
منابع مشابه
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
متن کاملThai News Text Summarization and Its Application
Since Thai language lacks word/phrase/sentence boundaries, document summarization in Thai needs investigations in unit segmentation, unit selection, redundancy removal and evaluation dataset construction. In this work, we have proposed Thai Elementary Discourse Unit (TEDU) and a three-stage method of Thai multidocument summarization, i.e., unit segmentation, unit-graph formulation, and unit sel...
متن کاملThai Multi-Document Summarization: Unit Segmentation, Unit-Graph Formulation, and Unit Selection
There have been several challenges in summarization of Thai multiple documents since Thai language itself lacks of explicit word/phrase/sentence boundaries. This paper gives definition of Thai Elementary Discourse Unit (TEDU) and then presents our three-stage summarization process. Towards implementation of this process, we propose unit segmentation using TEDUs and their derivatives, unitgraph ...
متن کاملLexical cohesion, discourse segmentation and document summarization
Summaries automatically derived by sentence extraction are known to exhibit some coherence degradation, readability deterioration, and topical under-representation. We propose a strategy for improving upon these problems, aiming to generate more cohesive summaries by analyzing the lexical cohesion factors in the source document texts. As an initial experiment, we have looked at one particular f...
متن کاملDiscourse Segmentation for Sentence Compression
Earlier studies have raised the possibility of summarizing at the level of the sentence. This simplification should help in adapting textual content in a limited space. Therefore, sentence compression is an important resource for automatic summarization systems. However, there are few studies that consider sentence-level discourse segmentation for compression task; to our knowledge, none in Spa...
متن کامل